
    wav2letter++: The Fastest Open-source Speech Recognition System

    This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition. We also show that wav2letter++'s training times scale linearly to 64 GPUs, the highest we tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and model tuning on new datasets and tasks.

    Libri-Light: A Benchmark for ASR with Limited or No Supervision

    We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely-available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR, speaker ID and genre descriptions. Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER). Settings (2) and (3) use limited textual resources (10 minutes to 10 hours) aligned with the speech. Setting (3) uses large amounts of unaligned text. They are evaluated on the standard LibriSpeech dev and test sets for comparison with the supervised state-of-the-art.
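    The metrics named for settings (2) and (3) — PER, CER, and WER — are all edit-distance rates: the Levenshtein distance between a reference transcript and a hypothesis, divided by the reference length, over phonemes, characters, or words respectively. As an illustration only (this sketch is not part of the Libri-Light toolkit), a minimal word error rate computation looks like:

    ```python
    def word_error_rate(reference: str, hypothesis: str) -> float:
        """WER = (substitutions + deletions + insertions) / reference length,
        via Levenshtein distance over word tokens."""
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between ref[:i] and hyp[:j]
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i  # delete all i reference words
        for j in range(len(hyp) + 1):
            d[0][j] = j  # insert all j hypothesis words
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + cost)   # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    # One dropped word out of six reference words: WER = 1/6
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    ```

    CER and PER follow the same formula with character or phoneme tokens in place of words; ABX, used in the zero-resource setting, is instead a discriminability score over learned representations.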

    Twenty years of coordination technologies: State-of-the-art and perspectives

    Since the complexity of inter- and intra-system interactions is steadily increasing in modern application scenarios (e.g., the IoT), coordination technologies are required to take a crucial step towards maturity. In this paper we look back at the history of the COORDINATION conference in order to shed light on the current status of the coordination technologies proposed there throughout the years, in an attempt to understand success stories and limitations, and to reveal the gap between actual technologies, theoretical models, and novel application needs.

    Modeling and programming social collaboration

    A whole is greater than the sum of its parts; a collaborating team is greater than a group of contributors working in isolation. In this thesis we introduce a novel technique called collaboration-assisted computation that evolves human-assisted computation in line with these postulates. Just as human computation focuses on integrating human input at various phases of machine computation, collaboration-assisted computation aims at integrating machine computation with input from collaborating teams. However, collaboration-assisted computation is more than a simple replacement of the term "human input" with the term "team input" in the pipeline of machine computation. What is collaboration without social interaction? How effective can collaboration be without convenient software tools? While the answers to these questions lie outside the scope of this thesis, we argue that truly efficient collaboration orbits around social context and collaborative software. Therefore, the center of gravity for collaboration-assisted computation lies at the intersection of human computation, social computing, and collaborative software. Moreover, collaboration-assisted computation relies on crowdsourcing to execute collaboration at massive scale. Hence, this thesis presents a holistic framework for modeling and programming collaboration-assisted computation. First, we present a query language capable of intuitively expressing complex social traits of collaborating groups. Second, we show how to model social collaboration processes. Third, the thesis introduces a programming language to coordinate collaborative teams and a framework for integrating social and collaborative software. Fourth, we show how crowdsourcing models can be extended to scale collaboration processes. The proposed modeling and programming languages were evaluated with extensive use cases, demonstrating the intuitiveness and expressiveness of each approach.